
    Learning Explicit and Implicit Arabic Discourse Relations.

    We propose in this paper a supervised learning approach to identify discourse relations in Arabic texts. To our knowledge, this work is the first attempt to address both explicit and implicit relations linking adjacent as well as non-adjacent Elementary Discourse Units (EDUs) within Segmented Discourse Representation Theory (SDRT). We use the Discourse Arabic Treebank corpus (D-ATB), which is composed of newspaper documents extracted from the syntactically annotated Arabic Treebank v3.2 part 3, where each document is associated with a complete discourse graph built according to the cognitive principles of SDRT. Our set of discourse relations is organized as a three-level hierarchy of 24 relations grouped into 4 top-level classes. To learn them automatically, we use state-of-the-art features whose effectiveness has been empirically demonstrated, and we investigate how each feature contributes to the learning process. We report experiments on identifying fine-grained discourse relations, mid-level classes, and top-level classes. We compare our approach with three baselines based on the most frequent relation, on discourse connectives, and on the features used by Al-Saif and Markert (2011). Our results are very encouraging and outperform all the baselines, with an F-score of 78.1% and an accuracy of 80.6%.
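    As a minimal illustration of the first baseline (most frequent relation), the sketch below predicts the most frequent training relation for every test instance and reports its accuracy. The function name and the toy labels are hypothetical; the relation names are SDRT-style examples only, not the D-ATB label distribution.

```python
from collections import Counter


def most_frequent_baseline(train_labels, test_labels):
    """Predict the most frequent training relation for every test instance.

    Returns (predicted_label, accuracy on test_labels).
    """
    majority, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for gold in test_labels if gold == majority)
    return majority, correct / len(test_labels)


# Hypothetical toy data: relation names are SDRT-style examples only.
train = ["Elaboration", "Elaboration", "Contrast", "Explanation"]
test = ["Elaboration", "Contrast", "Elaboration", "Narration"]
print(most_frequent_baseline(train, test))  # ('Elaboration', 0.5)
```

    Any learned model has to beat this floor; the same accuracy computation applies unchanged at the fine-grained, mid-level, or top-level granularity.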

    Combining Several Features to Evaluate the Content of Text Summaries

    In this paper, we propose a method that evaluates the content of a text summary using a machine learning approach. The method combines multiple features to build models that predict the PYRAMID scores of new summaries. We tested several single and ensemble learning classifiers to build the best model. A summarization system is evaluated using the average of the scores of the summaries produced by that system. The results show that our method performs well in predicting the content score of both an individual summary and a summarization system.
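    The system-level score described above is an average over each system's summaries. A minimal sketch, assuming the per-summary predicted scores are already available as (system, score) pairs; the function name and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def system_scores(summary_scores):
    """Average predicted summary scores per system.

    summary_scores: iterable of (system_id, predicted_score) pairs.
    Returns {system_id: mean score of that system's summaries}.
    """
    by_system = defaultdict(list)
    for system_id, score in summary_scores:
        by_system[system_id].append(score)
    return {sid: mean(scores) for sid, scores in by_system.items()}


# Hypothetical predicted scores for two systems.
preds = [("sysA", 0.5), ("sysA", 1.0), ("sysB", 0.25), ("sysB", 0.5), ("sysB", 0.75)]
print(system_scores(preds))  # {'sysA': 0.75, 'sysB': 0.5}
```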

    Mix Multiple Features to Evaluate the Content and the Linguistic Quality of Text Summaries

    In this article, we propose a method for evaluating the content and linguistic quality of text summaries that is based on a machine learning approach. The method combines multiple features to build predictive models that evaluate the content and linguistic quality of new (unseen) summaries constructed from the same source documents as the summaries used to train and validate the models. To obtain the best model, we tested many single and ensemble learning classifiers. The resulting models achieve good performance in predicting content and linguistic quality scores. To evaluate the summarization systems, we computed each system's score as the average of the scores of the summaries produced by that system, and then measured the correlation between this system score and the manual system score. The obtained correlation shows that our system score outperforms the baseline scores.
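    The system-level evaluation compares automatic and manual system scores through a correlation. The abstract does not say which coefficient is used; the sketch below assumes Pearson's r, and the score lists are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical system-level scores (automatic vs. manual) for four systems.
predicted = [0.75, 0.50, 0.40, 0.62]
manual = [0.80, 0.55, 0.35, 0.60]
print(round(pearson(predicted, manual), 3))
```

    A rank-based coefficient (Spearman, Kendall) would be the natural swap if only the ordering of systems matters.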

    Sequential Dialogue Act Recognition for Arabic Argumentative Debates

    Dialogue act (DA) recognition remains an essential task that helps users automatically identify participants' intentions. In this paper, we propose a sequential approach, consisting of segmentation followed by annotation, to identify dialogue acts in Arabic political debates. To perform DA recognition, we used the CARD corpus, labeled with the SADA annotation schema. Segmentation and annotation were then carried out with Conditional Random Fields (CRF) probabilistic models, which have proven highly effective for segmenting and labeling sequential data. Learning results are notably high for the segmentation task (F-score = 97.9%) and relatively reliable for the annotation process (F-score = 63.4%), given the complexity of identifying argumentative tags and the presence of disfluencies in spoken conversations.
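    To make the CRF step more concrete, the sketch below shows Viterbi decoding, the standard way a linear-chain CRF picks the best label sequence once its weights are learned. It assumes a B/I segmentation scheme (B = token starts a new segment, I = token continues it) and hand-written emission and transition scores; it does not show training and is not the authors' model.

```python
def viterbi(emissions, transitions, labels):
    """Best label sequence under a linear-chain model (CRF decoding step).

    emissions:   one dict per token mapping each label to a score
    transitions: dict mapping (previous_label, label) to a score; missing pairs score 0.0
    labels:      the label set, e.g. ["B", "I"]
    """
    # best[t][y]: score of the best path that ends with label y at token t
    best = [{y: emissions[0][y] for y in labels}]
    back = [{}]  # back[t][y]: previous label on that best path
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Start from the best final label and follow the back-pointers.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical scores for three tokens; B = starts a segment, I = continues it.
LABELS = ["B", "I"]
emissions = [{"B": 2.0, "I": 0.0}, {"B": 0.5, "I": 1.0}, {"B": 1.5, "I": 0.2}]
transitions = {("B", "B"): -0.5, ("B", "I"): 0.5, ("I", "I"): 0.3, ("I", "B"): 0.2}
print(viterbi(emissions, transitions, LABELS))  # ['B', 'I', 'B']
```

    The same decoder serves both stages of a segmentation-then-annotation pipeline; only the label set changes (segment boundaries first, then DA tags per segment).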

    Segmentation of Arabic Texts into Minimal Discourse Units

    Segmenting a text into Minimal Discourse Units (MDUs) aims to split the text into non-overlapping segments. These segments are then linked together to build the discourse structure of the text. Most existing approaches rely on extensive syntactic analysis. Unfortunately, some languages lack a robust syntactic parser. In this article, we study the feasibility of discourse segmentation of Arabic texts using a supervised learning approach that predicts MDUs and nested MDUs. The performance of our segmentation was evaluated on two genres of corpora: texts from secondary-school textbooks and texts from the Arabic Treebank corpus. We show that combining typographic, morphological and lexical features yields good recognition of segment boundaries. Moreover, we show that adding syntactic features does not improve the performance of our segmentation.
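    The typographic, lexical and morphological cues mentioned above can be turned into per-token features for a supervised boundary classifier. A rough sketch, assuming a tiny hypothetical list of Arabic connectives; the "و" prefix test is only a crude stand-in for real morphological analysis (it also fires on words such as وزير, "minister", where و belongs to the stem).

```python
# Hypothetical, tiny cue lists for illustration only.
CONNECTIVES = {"لكن", "لأن", "ثم", "حيث"}  # "but", "because", "then", "where/since"
PUNCTUATION = set(".,;:!?،؛؟")  # Latin and Arabic punctuation marks


def token_features(tokens, i):
    """Feature dict for token i, for use by a supervised boundary classifier."""
    tok = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    return {
        # Typographic cues
        "is_punct": tok in PUNCTUATION,
        "prev_is_punct": prev_tok in PUNCTUATION,
        # Lexical cues
        "is_connective": tok in CONNECTIVES,
        "prev_is_connective": prev_tok in CONNECTIVES,
        # Crude morphological proxy: word starting with the conjunction clitic "و"
        "has_wa_prefix": tok.startswith("و") and len(tok) > 1,
        "word": tok,
    }


# "The student went, but the school was closed." (pre-tokenized)
tokens = ["ذهب", "الطالب", "،", "لكن", "المدرسة", "مغلقة", "."]
print(token_features(tokens, 3))
```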

    Automatic Detection of Irony in French Tweets

    This article presents a supervised learning method for detecting irony in French tweets. A binary classifier uses state-of-the-art features of proven performance, as well as new features derived from our corpus study. In particular, we focused on negation and on explicit/implicit oppositions between opinion expressions with different polarities. The results obtained are encouraging.
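    Two of the cues described above, negation and explicit opposition between opinion words of different polarities, can be sketched as simple binary features. The polarity and negation lexicons here are tiny hypothetical lists; the sketch only detects oppositions expressed by the tweet's own words, not implicit ones.

```python
import re

# Hypothetical toy lexicons for illustration only.
POSITIVE = {"génial", "super", "bravo", "adore", "aime"}
NEGATIVE = {"panne", "retard", "nul", "déteste"}
NEGATIONS = {"ne", "n", "pas", "jamais", "rien"}


def irony_features(tweet):
    """Binary negation and explicit-opposition features for one tweet."""
    tokens = re.findall(r"\w+", tweet.lower())  # "n'aime" -> ["n", "aime"]
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    return {
        "has_negation": any(t in NEGATIONS for t in tokens),
        # Explicit opposition: opinion words of both polarities in the same tweet
        "polarity_opposition": has_pos and has_neg,
    }


print(irony_features("Génial, encore un retard de train #SNCF"))
# {'has_negation': False, 'polarity_opposition': True}
```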

    Arabic QA4MRE at CLEF 2012: Arabic Question Answering for Machine Reading Evaluation

    This paper presents the work carried out at the ANLP Research Group for the CLEF QA4MRE 2012 competition. This year, Arabic was introduced for the first time in the QA4MRE lab at CLEF, whose aim was to ask questions requiring deep knowledge of individual short texts; systems had to choose one answer from multiple answer choices by analyzing the corresponding test document in conjunction with background collections. In our participation, we proposed an approach that answers multiple-choice questions from short Arabic texts. This approach essentially consists of shallow information retrieval methods. Our submitted run obtained the following scores: overall accuracy over all questions is 0.19 (i.e., 31 of 160 questions answered correctly), and the overall c@1 measure is also 0.19. These overall results are not satisfactory compared to the best systems in last year's QA4MRE lab. However, as a first step on the roadmap from QA toward Machine Reading (MR) systems for Arabic, and given the scarcity of research on MR and deep knowledge reasoning in Arabic, it is an encouraging step. Despite its shallow criteria, our approach achieved the goal set at the outset: selecting answers to questions from short texts without requiring much external knowledge or complex inference.

    Trigui, O.; Hadrich Belguith, L.; Rosso, P.; Ben Amor, H.; Gafsaoui, B. (2012). Arabic QA4MRE at CLEF 2012: Arabic Question Answering for Machine Reading Evaluation. CELCT. http://hdl.handle.net/10251/46315
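    The equal accuracy and c@1 values in the QA4MRE run follow from how c@1 is defined: unanswered questions are credited with the accuracy obtained over the whole set, so when every question is answered (assumed here, since both scores are 0.19) c@1 reduces to plain accuracy. A minimal sketch of c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of correct answers, n_U the number of unanswered questions and n the total; the function name is hypothetical.

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 = (n_R + n_U * n_R / n) / n.

    Unanswered questions are credited with the accuracy n_R / n obtained over
    the whole question set; with n_U = 0 the measure equals plain accuracy.
    """
    if n_total <= 0 or n_correct < 0 or n_unanswered < 0:
        raise ValueError("counts must be non-negative and n_total positive")
    if n_correct + n_unanswered > n_total:
        raise ValueError("correct + unanswered cannot exceed the total")
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


# 31 correct answers out of 160 questions, assuming none left unanswered.
print(c_at_1(31, 0, 160))  # 0.19375, reported as 0.19
```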